Run auto-tagging only if no documents found by OCR by SCDevel · Pull Request #918 · icereed/paperless-gpt

SCDevel · 2026-03-05T06:23:23Z

ISSUE

Currently, if someone applies the AUTO_TAG and AUTO_OCR_TAG tags to some documents, waits a short time (e.g. 15 seconds), and then tags some more documents, the second batch may not be OCR'd until after tagging is complete. This would be fine if LLM startup times were not potentially very slow. (I initially hosted the models on an HDD, where startup could take minutes.)

SOLUTION

If OCR has processed documents, skip tagging and check OCR again. This rechecks for new documents, which in theory keeps the OCR model alive and prevents the slow start-and-stop behavior.
This also helps in situations where users mass-upload files with a workflow that sets the auto tags on consumption (or use the folder tags feature).

FLAW

If a document is processed fast enough, it could potentially miss this second check.
Maybe a delay would help. It would probably still be faster, even with a 10-20s delay.

TESTING

I have not tested this, as it is a relatively small feature. I also honestly don't know how the models are started and stopped, so this may not even fundamentally work, but I figured the regular maintainers would know that better than me.

Summary by CodeRabbit

Bug Fixes
- Optimized document processing by preventing redundant auto-tagging when OCR has already successfully processed documents. Auto-tagging now runs conditionally to improve system efficiency.

This is to help prevent a task-switching overhead that comes from the potentially long startup times of models.

coderabbitai · 2026-03-05T06:23:41Z

📝 Walkthrough

Walkthrough

The auto-tagging step in background.go now executes conditionally—only when OCR produces no documents. Previously, auto-tagging always ran after OCR regardless of its results. This optimization prevents redundant processing and improves efficiency by skipping unnecessary auto-tagging operations.

Changes

Cohort / File(s)	Summary
Background Processing Control Flow `background.go`	Made auto-tagging conditional on OCR results; auto-tagging only executes if OCR returns zero documents, avoiding redundant processing. Control flow now checks OCR document count before proceeding to auto-tagging phase.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related issues

Feature Request: Re-validate tag before processing each document (OCR via LLMs (Experimental) #861: Addresses auto-OCR/auto-tagging control flow by proposing per-document tag revalidation; this PR's conditional auto-tagging aligns with the goal of avoiding redundant document processing.

Possibly related PRs

refactor: Improve auto-tagging process to skip OCR-tagged documents #227: Directly modifies auto-tagging control flow to prevent re-tagging of OCR-handled documents by adjusting processed counts.
feat: add support automatic OCR #75: Introduced automatic OCR and auto-tagging functionality in background processing; this PR refines that flow with conditional execution.

Poem

🐰 When OCR completes its hoppy scan,
No need for tagging—skip the plan!
Smart conditionals save the day,
Processing cleaner, the rabbit way! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately captures the main control flow change: auto-tagging now only runs when OCR produces no documents, avoiding redundant processing.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.



Run auto-tagging only if no documents found by OCR

71798bc

This is to help prevent a task-switching overhead that comes from the potentially long startup times of models.

ivanzud added a commit to ivanzud/paperless-gpt that referenced this pull request Mar 8, 2026

Merge upstream PR icereed#918: run auto-tagging only when OCR queue i…

d7f8584

…s empty

ivanzud mentioned this pull request Mar 8, 2026

Resolve unresolved upstream March OCR/Ollama items ivanzud/paperless-gpt#131

Merged

SCDevel wants to merge 1 commit into icereed:main from SCDevel:patch-1
